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ABSTRACT 

As product prices become increasingly available on the 
World Wide Web, consumers attempt to understand how 
corporations vary these prices over time. However, corpora- 
tions change prices based on proprietary algorithms and hid- 
den variables (e.g., the number of unsold seats on a flight). 
Is it possible to develop data mining techniques that will 
enable consumers to predict price changes under these con- 
ditions? 

This paper reports on a pilot study in the domain of air- 
line ticket prices where we recorded over 12,000 price obser- 
vations over a 41 day period. When trained on this data, 
Hamlet — our multi-strategy data mining algorithm — gen- 
erated a predictive model that saved 341 simulated passen- 
gers $198,074 by advising them when to buy and when to 
postpone ticket purchases. Remarkably, a clairvoyant algo- 
rithm with complete knowledge of future prices could save 
at most $320,572 in our simulation, thus HAMLET's savings 
were 61.8% of optimal. The algorithm's savings of $198,074 
represents an average savings of 23.8% for the 341 passen- 
gers for whom savings are possible. Overall, Hamlet saved 
4.4% of the ticket price averaged over the entire set of 4,488 
simulated passengers. Our pilot study suggests that mining 
of price data available over the web has the potential to save 
consumers substantial sums of money per annum. 



Categories and Subject Descriptors 

1.2.6 [Artificial Intelligence]: Learning 
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1. INTRODUCTION AND MOTIVATION 

Corporations often use complex policies to vary product 
prices over time. The airline industry is one of the most 
sophisticated in its use of dynamic pricing strategies in an 
attempt to maximize its revenue. Airlines have many fare 
classes for seats on the same flight, use different sales chan- 
nels (e.g., travel agents, priceline.com, consolidators), and 
frequently vary the price per seat over time based on a slew 
of factors including seasonality, availability of seats, compet- 
itive moves by other airlines, and more. The airlines are said 
to use proprietary software to compute ticket prices on any 
given day, but the algorithms used are jealously guarded 
trade secrets [19]. Hotels, rental car agencies, and other 
vendors with a "standing" inventory are increasingly using 
similar techniques. 

As product prices become increasingly available on the 
World Wide Web, consumers have the opportunity to be- 
come more sophisticated shoppers. They are able to com- 
parison shop efficiently and to track prices over time; they 
can attempt to identify pricing patterns and rush or delay 
purchases based on anticipated price changes (e.g., "I'll wait 
to buy because they always have a big sale in the spring..."). 
In this paper we describe the use of data mining methods to 
help consumers with this task. We report on a pilot study 
in the domain of airfares where an automatically learned 
model, based on price information available on the Web, 
was able to save consumers a substantial sum of money in 
simulation. 

The paper addresses the following central questions: 

• What is the behavior of airline ticket prices 
over time? Do airfares change frequently? Do they 
move in small increments or in large jumps? Do they 
tend to go up or down over time? Our pilot study en- 
ables us to begin to characterize the complex behavior 
of airfares. 



• What data mining methods are able to detect 
patterns in price data? In this paper we consider 
reinforcement learning, rule learning, time series meth- 
ods, and combinations of the above. 

• Can Web price tracking coupled with data min- 
ing save consumers money in practice? Vendors 
vary prices based on numerous variables whose values 
are not available on the Web. For example, an air- 
line may discount seats on a flight if the number of 
unsold seats, on a particular date; is high relative to 
the airline's model. However, consumers do not have 
access to the airline's model or to the number of avail- 
able seats on the flights. Thus, a priori, price changes 
could appear to be unpredictable to a consumer track- 
ing prices over the Web. In fact, we have found price 
changes to be surprisingly predictable in some cases. 

The remainder of this paper is organized as follows. Sec- 
tion 2 describes our data collection mechanism and analyzes 
the basic characteristics of airline pricing in our data. Sec- 
tion 3 considers related work in the areas of computational 
finance and time series analysis. Section 4 introduces our 
data mining methods and describes how each method was 
tailored to our domain. We investigated rule learning [8], Q- 
learning [25], moving average models [13], and the combina- 
tion of these methods via stacked generalization [28]. Next, 
Section 5 describes our simulation and the performance of 
each of the methods on our test data. The section also re- 
ports on a sensitivity analysis to assess the robustness of 
our results to changes in the simulation. We conclude with 
a discussion of future work and a summary of the paper's 
contributions. 

2. DATA COLLECTION 

We collected airfare data directly from a major travel web 
site. In order to extract the large amount of data required 
for our machine learning algorithms, we built a flight data 
collection agent that runs at a scheduled interval, extracts 
the pricing data, and stores the result in a database. 

We built our flight data collection agent using Agent- 
Builder 1 for wrapping web sites and Theseus for executing 
the agent [3]. AgentBuilder exploits machine learning tech- 
nology [15] that enables the system to automatically learn 
extraction rules that reliably convert information presented 
on web pages into XML. Once the system has learned the 
extraction rules, AgentBuilder compiles this into a Theseus 
plan. Theseus is a streaming dataflow execution system that 
supports highly optimized execution of a plan in a network 
environment. The system maximizes the parallelism across 
different operators and streams data between operations to 
support the efficient execution of plans with complex navi- 
gation paths and extraction from multiple pages. 

For the purpose of our pilot study, we restricted our- 
selves to collecting data on non-stop, round-trip flights for 
two routes: Los Angeles (LAX) to Boston (BOS) and Seat- 
tle (SEA) to Washington, DC (IAD). Our departure dates 
spanned January 2003 with the return flight 7 days after de- 
parture. For each departure date, we began collecting pric- 
ing data 21 days in advance at three- hour intervals; data 
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for each departure date was collected 8 times a day. 2 Over- 
all, we collected over 12,000 fare observations over a 41 day 
period for six different airlines including American, United, 
etc. We used three-hour intervals to limit the number of 
http requests to the web site. For each flight, we recorded 
the lowest fare available for an economy ticket. We also 
recorded when economy tickets were no longer available; we 
refer to such flights as sold out 

2.1 Pricing Behavior in Our Data 

We found that the price of tickets on a particular flight 
can change as often as seven times in a single day. We cate- 
gorize price change into two types: dependent price changes 
and independent price changes. Dependent changes occur 
when prices of similar flights (i.e. having the same origin 
and destination) from the same airline change at the same 
time. This type of change can happen as often as once or 
twice a day when airlines adjust their prices to maximize 
their overall revenue or "yield" . Independent changes occur 
when the price of a particular flight changes independently 
of similar flights from the same airline. We speculate that 
this type of change results from the change in the seat avail- 
ability of the particular flight. Table 1 shows the average 
number of changes per flight aggregated over all airlines for 
each route. Overall, 762 price changes occurred across all 
the flights in our data. 63% of the changes can be classified 
as dependent changes based on the behavior of other flights 
by the same airline. 



Route 


Avg. number of price changes 


LAX-BOS 
SEA-IAD 


6.8 
5.4 



Table 1: Average number of price changes per route. 

We found that the round-trip ticket price for flights can 
vary significantly over time. Table 2 shows the minimum 
price, maximum price, and the maximum difference in prices 
that can occur for flights on each route. 



Route 


Min Price 


Max Price 


Max Price Change 


LAX-BOS 
SEA-IAD 


275 
281 


2524 
1668 


2249 
1387 



Table 2: Minimum price, maximum price, and max- 
imum change in ticket price per route. All prices in 
this paper refer to the lowest economy airfare avail- 
able for purchase. 



For many flights there are easily discernible price tiers 
where ticket prices fall into a relatively small price range. 
The number of tiers typically varies from two to four, de- 
pending on the airline and the particular flight. Even flights 
from the same airline with the same schedule (but with dif- 
ferent departure dates) can have different numbers of tiers. 
For example, there are two price tiers for the flight in Figure 
1, four price tiers in Figure 4 and three price tiers in Figure 
2 and Figure 3. 

2 We expected to record 168 (21 * 8) price observations for 
each flight. In fact, we found that on average each flight was 
missing 25 observations due to problems during data collec- 
tion including remote server failures, site changes, wrapper 
bugs, etc. 
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Figure 1: Price change over time for United Air- 
lines roundtrip flight#168:169 LAX-BOS departing 
on Jan 12. This figure is an example of two price 
tiers and how consumers might benefit from the 
price drop. 
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Figure 2: Price change over time for American Air- 
lines roundtrip flight#192:223, LAX-BOS departing 
on Jan 2. This figure shows an example of rapid 
price fluctuation in the days priori to the New Year. 



Price matching plays an important role in airline pricing 
structure. Airlines use sophisticated software to track their 
competitors' pricing history and propose adjustments that 
optimize their overall revenue. To change the price, airlines 



2750 -i 

2250 , 




250 — i 1— 1 1 ■ \ 

12/8/2002 12/13/2002 12/18/2002 12/23/2002 12/28/2002 1/22003 1/7/2003 
Data 

Figure 3: Price change over time for American Air- 
lines roundtrip flight#192:223, LAX-BOS departing 
on Jan 7. This figure shows an example of three 
price tiers and low price fluctuation 



Figure 4: Price change over time for Alaska Airlines 
roundtrip flight#6:3, SEA-IAD departing on Jan 4. 
This figure shows an example of four price tiers. 

need to submit the change to the Airline Tariff Publish- 
ing Company (ATPCO), 3 the organization formed by leading 
airlines around the world that collects and distributes airline 
pricing data. The whole process of detecting competitors' 
fare changes, deciding whether or not to match competitors' 
prices, and submitting the price update at ATPCO can take 
up to one day [19]. 

Price changes appear to be fairly orderly on some flights 
(e.g., Figure 3), and we see evidence of the well-known 7 
and 14 day "advance purchase" fares. However, we also 
see plenty of surprising price changes. For example, flights 
that depart around holidays appear to fluctuate more (e.g., 
Figure 2. Figure 2 and Figure 3 show how pricing strategies 
differ between two nights from American Airlines that have 
the same schedule but fly on different dates. Figure 2 shows 
a flight that departs around the new year, while Figure 3 
shows the flight that departs one week after the first flight. 
Both flights have the tier structure that we described earlier 
in this section, but ticket prices in the first flight fluctuate 
more often. 

In terms of pricing strategy, we can divide the airlines 
into two categories. The first category covers airlines that 
are big players in the industry, such as United Airlines, and 
American Airlines. The second category covers smaller air- 
lines that concentrate on selling low-price tickets, such as 
Air Trans and Southwest. We have found that pricing poli- 
cies tend to be similar for airlines that belong to the same 
category. Fares for airlines in the first category are expen- 
sive and fluctuate often, while fares for airlines in the second 
category are moderate and appear relatively stable. How- 
ever, there are some policies that every airline seems to use. 
For example, airlines usually increase ticket prices two weeks 
before departure dates and ticket prices are at a maximum 
on departure dates. 

3. RELATED WORK 

Previous work in the AI community on the problem of 
predicting product prices over time has been limited to the 
Trading Agent Competition (TAC) [27], In 2002, TAC fo- 
cused on the travel domain. TAC relies on a simulator of 
airline, hotel, and ticket prices and the competitors build 
agents to bid on these. The problem is different from ours 
since the competition works as an auction (similar to Price- 

3 see http : //www . atpco .net. 
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line.com). Whereas we gathered actual flight price data from 
the web, TAC simulates flight prices using a stochastic pro- 
cess that follows a random walk with an increasingly upward 
bias. Also, the TAC auction of airline tickets assumes that 
the supply of airline tickets is unlimited. Several TAC com- 
petitors have explored a range of methods for price predic- 
tion including historical averaging, neural nets, and boost- 
ing. It is difficult to know how these methods would perform 
if reconfigured for our price mining task. 

There has been some recent interest in temporal data min- 
ing (see [23] for a survey). However, the problems studied 
under this heading are often quite different from our own 
(e.g., [1]). There has also been algorithmic work on time se- 
ries methods within the data mining community (e.g., [4]). 
We discuss time series methods below. 

Problems that are closely related to price prediction over 
time have been studied in statistics under the heading of 
"time series analysis" [7, 13, 9] and in computational fi- 
nance [20, 22, 21] under the heading of "optimal stopping 
problems 55 . However, these techniques have not been used 
to predict price changes for consumer goods based on data 
available over the web. Moreover, we combine these tech- 
niques with rule learning techniques to improve their per- 
formance. 

Computational finance is concerned with predicting prices 
and making buying decisions in markets for stock, options, 
and commodities. Prices in such markets are not determined 
by a hidden algorithm, as in the product pricing case, but 
rather by supply and demand as determined by the actions 
of a large number of buyers and sellers. Thus, for example, 
stock prices tend to move in small incremental steps rather 
than in the large, tiered jumps observed in the airline data. 

Nevertheless, there are well known problems in options 
trading that are related to ours. First, there is the early ex- 
ercise of American Calls on stocks that pay dividends. The 
second problem is the exercise of American Puts on stocks 
that don't pay dividends. These problems are described in 
sections 11.12 and 7.6 respectively of [14]. In both cases, 
there may be a time before the expiration of an option at 
which its exercise is optimal. Reinforcement learning meth- 
ods have been applied to both problems, and that is one 
reason we consider reinforcement learning for our problem. 

Time series analysis is a large body of statistical tech- 
niques that apply to a sequence of values of a variable that 
varies over time due to some underlying process or structure 
[7, 13, 9]. The observations of product prices over time are 
naturally viewed as time series data. Standard data mining 
techniques are "trained' 5 on a set of data to produce a pre- 
dictive model based on that data, which is then tested on a 
separate set of test data. In contrast, time series techniques 
would attempt to predict the value of a variable based on 
its own history. For example, our moving average model at- 
tempts to predict the future changes in the price of a ticket 
on a flight from that flight's own price history. 

There is also significant interest in bidding and pricing 
strategies for online auctions. For example, in [24] Harshit et 
al. use cluster analysis techniques to categorize the bidding 
strategies being used by the bidders. And in [17], Lucking- 
Reiley et al. explore the various factors that determine the 
final price paid in an online auction, such as the length of 
the auction, whether there is a reserve price, and the reputa- 
tion of the seller. However, these techniques are not readily 
applicable to our price mining problem. 



Comparison shopping "hots 55 gather price data available 
on the web for a wide range of products. 4 These are de- 
scendants of the Shopbot [11] which automatically learned 
to extract product and price information from online mer- 
chants' web sites. None of these services attempts to analyze 
and predict the behavior of product prices over time. Thus, 
the data mining methods in this paper complement the body 
of work on shopbots. 

4. DATA MINING METHODS 

In this section we explain how we generated training data, 
and then describe the various data mining methods we in- 
vestigated: Ripper [8], Q-learning [25], and time series [13, 
9]. We then explain how our data mining algorithm, Ham- 
let, combines the results of these methods using a variant 
of stacked generalization [26, 28]. 

Our data consists of price observations recorded every 3 
hours over a 41 day period. Our goal is to learn whether to 
buy a ticket or wait at a particular time point, for a particu- 
lar flight, given the price history that we have recorded. All 
of our experiments enforce the following essential temporal 
constraint: all the information used to make a decision at 
particular time point was recorded before that time point. 
In this way, we ensure that we rely on the past to predict 
the future, but not vice versa. 

4.1 Rule Learning 

Our first step was to run the popular Ripper rule learning 
system [8] on our training data. Ripper is an efficient sep- 
arate and conquer rule learner. We represented each price 
observation to Ripper as a vector of the following features: 

• Flight number. 

• Number of hours until departure (denoted as hours- 
before-takeoff). 

• Current price. 

• Airline. 

• Route (LAX-BOS or SEA-IAD). 

The class labels on each training instance were 'buy' or 
'wait 1 . 

We considered a host of additional features derived from 
the data, but they did not improve Ripper's performance. 
We did not represent key variables like the number of unsold 
seats on a flight, whether an airline is running a promotion, 
or seasonal variables because Hamlet did not have access 
to this information. However, see Section 6 for a discussion 
of how Hamlet might be able to obtain this information in 
the future. 

Some sample rules generated by Ripper are shown in Fig- 
ure 5. 

In our domain, classification accuracy is not the best met- 
ric to optimize because the cost of misclassified examples 
is highly variable. For example, inisclassifying a single ex- 
ample can cost from nothing to upwards of $2,000. Meta- 
Cost [10] is a well-known general method for training cost- 
sensitive classifiers. In our domain, MetaCost will make a 
learned classifier either more conservative or more aggres- 
sive about waiting for a better price, depending on the cost 

4 See, for example, froogle.google.com and mysimon.com. 
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IF hours-before-takeoff >= 252 AND price >= 2223 
AND route = LAX-BOS THEN wait 

IF airline = United AND price >= 360 

AND hours-before-takeoff >= 438 THEN wait 

Figure 5: Sample Ripper rules. 

of misclassifying a l buy' as a 'wait' compared with the cost 
of rnisclassifyiiig a l wait' as a 'buy'. We implemented Meta- 
Cost with mixed results. 

We found that MetaCost improves Ripper's performance 
by 14 percent, but that MetaCost hurts Hamlet's overall 
performance by 29 percent. As a result, we did not use 
MetaCost in Hamlet. 

4.2 Q-learning 

As our next step we considered Q-learning, a species of 
reinforcement learning [25]. Reinforcement learning seems 
like a natural fit because after making each new price ob- 
servation HAMLET has to decide whether to buy or to wait. 
Yet the reward (or penalty) associated with the decision is 
only determined later, when Hamlet determines whether it 
saved or lost money through its buying policy. Reinforce- 
ment learning is also a popular technique in computational 
finance [20, 22, 21]. 

The standard Q-learning formula is: 

Q(a,s) = R(s,a) + jmax a t(Q(a\ s')) 

Here, a) is the immediate reward, 7 is the discount 
factor for future rewards, and s f is the state resulting from 
taking action a in state s. We use the notion of state to 
model the state of the world after each price observation 
(represented by the price, flight number, departure date, 
and number of hours prior to takeoff) . Thus, there are two 
possible actions in each state: b for 'buy' and w for 'wait'. 

Of course, the particular reward function used is critical 
to the success (or failure) of Q-learning. In our study, the 
reward associated with b is the negative of the ticket price 
at that state, and the state resulting from b is a terminal 
state so there is no future reward. The immediate reward 
associated with w is zero as long as economy tickets on the 
flight do not sell out in the next time step. We set 7 = 1, 
so we do not discount future rewards. 

To discourage the algorithm from learning a model that 
waits until flights sell out, we introduce a "penalty" for such 
flights in the reward function. Specifically, in the case where 
the flight does sell out at the next time point, we make 
the immediate reward for waiting a negative constant whose 
absolute value is substantially greater than the price for any 
flight. We set the reward for reaching a sold-out state to be 
—300, 000. This setting can best be explained below, after 
we introduce a notion of equivalence classes among states. 

In short, we define the Q function by 

Q(6, s) = —price(s) 

nf . J -300000 if flight sells out after s. 

- l ma x(Q(b t 3%Q(w t s')) otherwise. 

To generalize from the training data we used a variant 
of the averaging step described in [18]. More specifically, 
we defined an equivalence class over states, which enabled 
the algorithm to train on a limited set of observations of the 



class and then use the learned model to generate predictions 
for other states in the class. 

To define our equivalence class we need to introduce some 
notation. Airlines typically use the same flight number (e.g., 
UA 168) to refer to multiple flights with the same route 
that depart at the same time on different dates. Thus, 
United flight 168 departs once daily from LAX to Boston at 
10:15pm. We refer to a particular flight by a combination of 
its flight number and date. For example, UA168-Jan7 refers 
to flight 168 which departs on January 7th, 2003. Since we 
observe the price of each flight eight times in every 24 hour 
period, there are many price observations for each flight. We 
distinguish among them by recording the time (number of 
hours) until the flight departs. Thus, UA168-Jan7-120 is the 
price observation for flight UA168, departing on January 7, 
which was recorded on January 2nd (120 hours before the 
flight departs on the 7th). Our equivalence class is the set of 
states with the same flight number and the same hours be- 
fore takeoff, but different departure dates. Thus, the states 
denoted UA168-Jan7-120 and UA168-Janl0-I20 are in the 
same equivalence class, but the state UA168-Jan7-117 is not. 
We denote that s and s* are in the same equivalence class 
by s ~ s* . 

Thus, our revised Q-learning formula is: 

Q(a, s) = Avy s * ~ 3 {R{s* , a) + max a t(Q(a ,s'))) 

The reason for choosing -300,000 is now more apparent: 
the large penalty can tilt the average toward a low value, 
even when many Q values are being averaged together. Sup- 
pose, for example, that there are ten training examples in 
the same equivalence class, and each has a current price of 
$2,500. Suppose now that in nine of the ten examples the 
price drops to $2,000 at some point in the future, but the 
flight in the tenth example sells out in the next state. The Q 
value for waiting in any state in this equivalence class will be 
(-300, 000-2, 000*9)/10 = -31, 800, or still much less then 
the Q value for any equivalence class where no flight sells 
out in the next state. Thus the choice of reward for a flight 
that sells out will determine how willing the Q-Learning al- 
gorithm will be to risk waiting when there's a chance a flight 
may sell out. Using a hill climbing search in the space of 
penalties, we found -300,000 to be locally optimal. 

Q- learning can be very slow, but we were able to ex- 
ploit the structure of the problem and the close relationship 
between dynamic programming and reinforcement learning 
(see [25]) to complete the learning in one pass over the train- 
ing set. Specifically, the reinforcement learning problem we 
face has a particularly nice structure, in which the value 
of Q(b, s) depends only on the price in state s, and the 
value of Q(W) s) depends only on the Q values of exactly 
one other state: the state containing the same flight num- 
ber and departure date but with three hours less time left 
until departure. Applying dynamic programming is thus 
straightforward, and the initial training step requires only 
a single pass over the data. In order to compute averages 
over states in the same equivalence class, we keep a running 
total and a count of the Q values in each equivalence class. 
Thus, the reinforcement learning algorithm just makes a sin- 
gle pass over the training data, which bodes well for scaling 
the algorithm to much larger data sets. 

The output of Q-learning is the learned policy, which de- 
termines whether to buy or wait in unseen states by mapping 
them to the appropriate equivalence class and choosing the 
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action with the lowest learned cost. 

4.3 Time Series 

Time series analysis is a large and diverse subfield of 
statistics whose goal is to detect and predict trends. In this 
paper, we investigated a first order moving average model. 
At time step £, the model predicts the price one step into the 
future, based on a weighted average of prices already 
seen. Thus, whereas Q- learning and Ripper attempt to gen- 
eralize from the behavior of a set of flights in the training 
data to the behavior of future flights, the moving average 
model attempts to predict the price behavior of a flight in 
the test data based on its own history. 

At time i, we predict the next price using a fixed window 
of price observations, pt-k+i, • • • , Pi* (In HAMLET, we found 
that setting k to one week's worth of price observations was 
locally optimal.) We take a weighted average of these prices, 
weighting the more recent prices more and more heavily. 
Formally, we predict that p t +i will be 

where a(i) is some increasing function of i. We experi- 
mented with different a functions and chose a simple linearly 
increasing function. 

Given the time series prediction, HAMLET relies on the 
following simple decision rule: if the model predicts that 
pt+i > Pti then buy, otherwise wait. Thus, our time series 
model makes its decisions based on a one-step prediction 
of the ticket price change. The decision rule ignores the 
magnitude of the difference between pt+i and pt, which is 
overly simplistic, and indeed the time series prediction does 
not do very well on its own (see Table 3). However, Hamlet 
uses the time series predictions extensively in its rules. In 
effect, the time series prediction provides information about 
how the current price compares to a local average, and that 
turns out to be valuable information for Hamlet. 

4.4 Stacked Generalization 

Ensemble-based learning techniques such as bagging [5], 
boosting [12], and stacking [26, 28], which combine the re- 
sults of multiple generalizes, have been shown to improve 
generalizer accuracy on many data sets. In our study, we 
investigated multiple data mining methods with very differ- 
ent characteristics (Ripper, Q-leaniing, and time series) so 
it makes sense to combine their outputs. 

We preferred stacking to voting algorithms such as 
weighted majority [16] or bagging [5] because we believed 
that there were identifiable conditions under which one 
method's model would be more successful than another. See, 
for example, the sample rule in Figure 6. 

Standard stacking methods separate the original vec- 
tor representation of training examples (level- 0 data in 
Wolpert's terminology), and use the class labels from each 
level-0 generalizer, along with the example's true classifi- 
cation as input to a meta- level (or level- 1) generalizer. To 
avoid over-fitting, "care is taken to ensure that the mod- 
els are formed from a batch of training data that does not 
include the instance in question" [26]. 

In our implementation of stacking, we collapsed level-0 
and level- 1 features. Specifically, we used the feature repre- 
sentation described in Section 4.1 but added three additional 
features corresponding to the class labels (buy or wait) coin- 



Let TS be the output of the Time Series algorithm, 
and let QL be the output of Q-Learning. 

LF hours-before- takeoff >= 480 AND airline = United 
AND price >= 360 AND TS = buy AND QL = wait 
THEN wait 

Figure 6: A sample rule generated by Hamlet. 

puted for each training example by our level-0 generalizers. 
To add our three level- 1 features to the data, we applied 
the model produced by each base-level generalizer (Ripper, 
Q-learning, and time series) to each instance in the training 
data and labeled it with 'buy' or 'wait*. Thus, we added 
features of the form TS = buy (time series says to buy) and 
QL — wait (Q-learning says to wait). 

We then used Ripper as our level- 1 generalizer, running 
it over this augmented training data. We omitted leave- 
one-out cross validation because of the temporal nature of 
our data. Although a form of cross validation is possible 
on temporal data, it was not necessary because each of our 
base learners did not appear to overfit the training data. 

Our stacked generalizer was our most successful data min- 
ing method as shown in Table 3 and we refer to it as Ham- 
let. 

4.5 Hand-Crafted Rule 

After we studied the data in depth and consulted with 
travel agents, we were able to come up witli a fairly simple 
policy "by hand" . We describe it below, and include it in our 
results as a baseline for comparison with the more complex 
models produced by our data mining algorithms. 

The intuition underlying the hand-crafted rules is as fol- 
lows. First, to avoid sell outs we do not want to wait too 
long. By inspection of the data, we decided to buy if the 
price has not dropped within 7 days of the departure date. 
We can compute an expectation for the lowest price of the 
flight in the future based on similar flights in the training 
data. s If the current price is higher than the expected min- 
imum then it is best to wait. Otherwise, we buy. 

More formally, let MinPrice(s,t) of a flight in the train- 
ing set denote the minimum price of that flight over the 
interval starting from s days before departure up until 
time t (or until the flight sells out). Let ExpPrice(s i t) 
for a particular flight number denote the average over all 
MinPrice(s> t) for flights in the training set with that flight 
number. Suppose a passenger asks at time t n to buy a ticket 
that leaves in sq days, and whose current price is Cur Price. 
The hand-crafted rule is shown in Figure 7. 

IF ExpPrice(s l)i t 0 ) < CurPrice 
AND s 0 > 7 days THEN wait 
ELSE buy 

Figure 7: Hand-crafted rule for deciding whether to 
wait or buy. 

We also considered simpler decision rules of the form "if 
the current time is less than K days before the flight's de- 
parture then buy." In our simulation (described below) we 

s For "similar" flights we used flights with the same airline 
and flight number. 



tested such rules for K ranging from 1 to 22, but none of 
these rules resulted in savings and some resulted in sub- 
stantial losses. 

5. EXPERIMENTAL RESULTS 

In this section we describe the simulation we used to as- 
sess the savings due to each of the data mining methods 
described earlier. We then compare the methods in Table 3, 
perform a sensitivity analysis of the comparison along sev- 
eral dimensions, and consider the implications of our pilot 
study. 

5.1 Ticket Purchasing Simulation 

The most natural way to assess the quality of the predic- 
tive models generated by the data mining methods described 
in Section 4 is to quantify the savings that each model would 
generate for a population of passengers. For us, a passenger 
is a person wanting to buy a ticket on a particular flight 
at a particular date and time. It is easy to imagine that 
an online travel agent such as Expedia or Travelocity could 
offer discounted fares to passengers on its web site, and use 
Hamlet to appropriately time ticket purchases behind the 
scenes. For example, if HAMLET anticipates that a fare will 
drop by $500, the agent could offer a $300 discount and keep 
$200 as compensation and to offset losses due prediction er- 
rors by Hamlet. 

Since Hamlet is not yet ready for use by real passengers, 
we simulated passengers by generating a uniform distribu- 
tion of passengers wanting to purchase tickets on various 
flights as a function of time. Specifically, the simulation 
generated one passenger for each fare observation in our set 
of test data. The total number of passengers was 4,488. 
Thus, each simulated passenger has a particular flight for 
which they need to buy a ticket and an earliest time point 
at which they could purchase that ticket (called the "earliest 
purchase point"). The earliest purchase points, for different 
simulated passengers, varied from 21 days before the flight 
to the day of the flight. 

At each subsequent time point, Hamlet decides whether 
to buy a ticket immediately or to wait. This process con- 
tinues until either the passenger buys a ticket or economy 
seats on the flight sell out, in which case Hamlet will buy 
a higher priced business-class ticket for the flight. 6 We de- 
fined upgrade costs as the difference between the cost of a 
business class ticket and the cost of an economy ticket at 
the earliest purchase point. In our simulation, Hamlet was 
forced to "upgrade" passengers to business class only 0.42% 
of the time, but the total cost of these upgrades was quite 
high ($38,743 in Table 3). 7 

We recorded for each simulated passenger, and for each 
predictive model considered, the price of the ticket pur- 
chased and the optimal price for that passenger given their 
earliest time point and the subsequent price behavior for 
that flight. The savings (or loss) that a predictive model 
yields for a simulated passenger is the difference between the 
price of a ticket at the earliest purchase point and the price 

6 It , s possible, of course, for business class to sell out as well, 
in which case Hamlet would have to buy a first-class ticket 
or re-book the passenger on a different flight. However, busi- 
ness class did not sell out in our simulation. 
7 Since we did not collect upgrade costs for all flights, our 
upgrade costs are approximate but always positive and often 
as high as $1,000 or more. 



of the ticket at the point when the predictive model recom- 
mends buying. Net savings is savings net of both losses and 
upgrade costs. 

5.2 Savings 

Table 3 shows the savings, losses, upgrade costs, and net 
savings achieved in our simulation by each predictive model 
we generated. We also report on the frequency of upgrades 
as a percentage of the total passenger population, the net 
savings as a percent of the total ticket price, and the perfor- 
mance of each model as a percent of the maximal possible 
savings. 

The models we used are the following: 

• Optimal: This model represents the maximal possi- 
ble savings, which are computed by a "clairvoyant" al- 
gorithm with perfect information about future prices, 
and which obtained the best possible purchase price 
for each passenger. 

• By hand: This model was hand-crafted by one of 
the authors after consulting with travel agents and 
throughly analyzing our training data (see Figure 7). 

• Time series: This model was generated by the mov- 
ing average method described earlier. 

• Ripper: This model was generated by Ripper. 

• Q- learning: This model was generated by our Q- 
learning method. 

• Hamlet: This model was generated by our stacking 
generalizer which combined the results of Ripper, Q- 
learning, and Time series. 

Table 3 shows a comparison of the different methods. Note 
that the savings measure we focus on is savings net of losses 
and upgrade costs. We see that Hamlet outperformed each 
of the learning methods as well as the hand-crafted model 
to achieve a net savings of $198,074. Furthermore, despite 
the fact that HAMLET had access to a very limited price 
history and no information about the number of unsold seats 
on the flight, its net savings were a remarkable 61.8% of 
optimal. Finally, while an average net savings of 4.4% may 
not seem like much, passengers spend billions of dollars on 
air travel each year so 4.4% amounts to a substantial number 
of dollars. 

We believe that our simulation understates the savings 
that Hamlet would achieve in practice. For close to 75% of 
the passengers in our test set, savings were not possible be- 
cause prices never dropped from the earliest purchase point 
until the flight departed. We report the percent savings in 
ticket prices over the set of flights where savings was possible 
("feasible flights") in Table 4. These savings figures are of 
interest because of the unrealistic distribution of passengers 
in our simulation. Because we only gathered data for 21 
days before each flight in our test set, passengers "arrived" 
at most 21 days before a flight. Furthermore, due to the 
uniform distribution of passengers, 33% of the passengers 
arrived at most 7 days before the flight's departure, when 
savings are hard to come by. In fact, on our test data, Ham- 
let lost money for passengers who "arrived" in the last 7 
days prior to the flight. We believe that in practice we would 
find additional opportunities to save money for the bulk of 
passengers who buy their tickets more than 7 days before 
the flight date. 
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Method 


Savings 


Losses 


Upgrade Cost 


% Upgrades 


Net Savings 


% Savings 


% of Optimal 


Optimal 


$320,572 


$0 


$0 


0% 


$320,572 


7.0% 


100% 


By hand 


$228,318 


$35,329 


$22,472 


0.36% 


$170,517 


3.8% 


53.2% 


Ripper 


$211,031 


$4,689 


$33,340 


0.45% 


$173,002 


3.8% 


54.0% 


Time Series 


$269,879 


$6,138 


$693,105 


33.0% 


-$429,364 


-9.5% 


-134% 


Q-learning 


$228,663 


$46,873 


$29,444 


0.49% 


$152,364 


3.4% 


47.5% 


Hamlet 


$244,868 


$8,051 


$38,743 


0.42% 


$198,074 


4.4% 


61.8% 



Table 3: Savings by Method. 



Method 


Net Savings 


Optimal 


30.6% 


By hand 


21.8% 


Ripper 


20.1% 


Time Series 


25.8% 


Q-learning 


21.8% 


Hamlet 


23.8% | 



Table 4: Comparison of Net Savings (as a percent 
of total ticket price) on Feasible Flights. 

5.3 Sensitivity Analysis 

To test the robustness of our results to changes in our sim- 
ulation, we varied two key parameters. First, we changed 
the distribution of passengers requesting flight tickets. Sec- 
ond, we changed the model of a passenger from one where 
a passenger wants to purchase a ticket on a particular flight 
to one where a passenger wants to fly at any time during 
a three hour interval. The interval model is similar to the 
interface offered at many travel web sites where a potential 
buyer specifies if they want to fly in the morning, afternoon, 
or evening. 

We used the following distributions to model the earliest 
purchase point (i.e., the first time point at which passengers 
"arrive 55 and need to decide whether to buy a ticket or to 
wait): 

• Uniform: a uniform distribution of simulated pas- 
sengers over the 21 days before the flight's departure 
date; 

• Linear Decrease: a distribution in which the number 
of passengers arriving at the system decreased linearly 
as the amount of time left before departure decreased; 

• Quadratic Decrease: a distribution like Linear De- 
crease, but with a quadratic relationship; 

• Square Root Decrease: a distribution like Linear 
Decrease, but with a square root relationship; 

• Linear Increase: a distribution like Linear Decrease, 
except that the number of passengers increase as the 
amount of time left before departure decreased; 

• Quadratic Increase: a distribution like Linear In- 
crease, but with a quadratic relationship; 

• Square Root Increase: a distribution like Linear 
Increase, but with a square root relationship. 

Table 6 reports the net savings, as a percentage of the to- 
tal ticket price, under the different distributions. Hamlet 
saved more than 2.5% of the ticket price in all cases, and 



it saved more than any other method on all distributions 
except the Quadratic Decrease distribution, where it per- 
formed slightly worse than the hand-crafted decision rule. 
Hamlet's savings were above 38% of optimal in all cases. 

Table 5 reports on the performance of the different meth- 
ods under the modified model where a passenger requests a 
ticket on a non-stop flight that departs at any time during a 
particular three hour interval (e.g., morning). This different 
model does not change our results qualitatively. Hamlet 
still achieves a substantial percentage of the optimal sav- 
ings (59.2%) and its percentage of upgrades drops to only 
0.1%. Finally, Hamlet still substantially outperforms the 
other data mining methods. 



Method 


Net Savings 


% of Optimal 


% upgrades 


Optimal 


$323,802 


100% 


0% 


By hand 


$163,523 


55.5% 


0% 


Ripper 


$173,234 


53.5% 


0% 


Time Series 


-$262,749 


-81.1% 


6.3% 


Q-Learning 


$149,587 


46.2% 


0.2% 


Hamlet 


$191,647 


59.2% 


0.1% 



Table 5: Performance of algorithms on multiple 
flights over three hour interval. 

Overall, our analysis confirms that Hamlet's perfor- 
mance on the test data is robust to the parameters we varied. 

6. FUTURE WORK 

There are several promising directions for future work on 
price mining. We plan to perform a more comprehensive 
study on airline pricing with data collected over a longer 
period of time and over more routes. We plan to include 
multi-leg flights in this new data set. The pricing behavior 
of multi-leg flights is different than that of non-stop flights 
because each leg in the flight can cause a change in the price, 
and because pricing through airline hubs appears to behave 
differently as well. 

We also plan to exploit other sources of information to 
further improve Hamlet's predictions. We do not currently 
have access to a key variable — the number of unsold seats 
on a flight. However, on-line travel agents and centralized 
reservation systems such as Sabre or Galileo do have this 
information. If we had access to the number of unsold seats 
on a flight, Hamlet could all but eliminate the need to 
upgrade passengers, which is a major cost. 

To use the methods in this paper on the full set of domes- 
tic and international flights on any given day would require 
collecting vast amounts of data. One possible way to address 
this problem is to build agents on demand that collect the 
required data to make price predictions for on a particular 
future flight on a particular day. The agents would still need 



Distribution 


By hand 


Q- Learn 


Time Series 


Ripper 


Hamlet 


Quadratic Decrease 


4.07% 


3.77% 


-24.96% 


2.46% 


3.96% 


Linear Decrease 


4.70% 


4.30% 


-26.76% 


4.13% 


5.17% 


Sqrt Decrease 


4.47% 


4.04% 


-29.05% 


4.23% 


5.03% 


Uniform 


3.77% 


3.37% 


-32.55% 


3.83% 


4.38% 


Sqrt Increase 


3.66% 


3.24% 


-34.63% 


4.05% 


4.39% 


Linear Increase 


3.13% 


2.72% 


-36.55% 


3.62% 


3.85% 


Quadratic Increase 


2.10% 


1.74% 


-39.90% 


2.48% 


2.60% 



Table 6: Sensitivity of Methods to Distribution of Passengers' Earliest Purchase Points. The numbers 
reported are the savings, as a percentage of total ticket price, achieved by each algorithm under each distri- 
bution. We see that Hamlet outperforms Q-learning, time series, and Ripper on all distributions. 



to collect data for multiple flights, but the amount of data 
would be much smaller. This type of agent would fit well 
within the Electric Elves system [6, 2], which deploys a set 
of personalized agents to monitor various aspects of a trip. 
For example, Elves can notify you if your flight is delayed 
or canceled or let you know if there is an earlier connecting 
flight to your destination. 

Beyond airline pricing, we believe that the techniques de- 
scribed in this paper will apply to other product categories. 
In the travel industry, hotels and car rental agencies employ 
many of the same pricing strategies as the airlines and it 
would be interesting to see how much HAMLET can save in 
these product categories. Similarly, online shopping sites 
such as Amazon and Wal-mart are beginning to explore 
more sophisticated pricing strategies and Hamlet will al- 
low consumers to make more informed decisions. Finally, 
reverse auction sites, such as half.com, also provide an op- 
portunity for Hamlet to learn about pricing over time and 
make recommendations about purchasing an item right away 
or waiting to buy it. In general, price mining over time pro- 
vides a new dimension for comparison shopping engines to 
exploit. 

We recognize that if a progeny of Hamlet would achieve 
wide spread use it could start to impact the airlines' (already 
slim) profit margins. Could the airlines introduce noise into 
their pricing patterns in an attempt to fool a price miner? 
While we have not studied this question in depth, the ob- 
vious problem is that changing fares on a flight in order to 
fool a price miner would impact all consumers considering 
buying tickets on that flight. If the price of a ticket moves 
up substantially, then consumers are likely to buy tickets 
on different flights resulting in a revenue loss for the airline. 
Similarly, if the price moves down substantially, consumers 
will be buying tickets at a discount resulting in a revenue 
loss again. Thus, to avoid these distortions, the airlines are 
forced to show the prices that they actually want to charge 
for tickets. Of course, there are more prosaic methods of 
trying to block a price miner such as placing prices inside 
GIF files or blocking the IP address of the price miner. How- 
ever, an "industrial strength" price miner would not rely on 
"scraping" information from web sites, but would access a 
fare database directly. 

7. CONCLUSION 

This paper reported on a pilot study in "price mining" 
over the web. We gathered airfare data from the web and 
showed that it is feasible to predict price changes for flights 
based on historical fare data. Despite the complex algo- 
rithms used by the airlines, and the absence of informa- 



tion on key variables such as the number of seats available 
on a flight, our data mining algorithms performed surpris- 
ingly well. Most notably, our Hamlet data mining method 
achieved 61.8% of the possible savings by appropriately tim- 
ing ticket purchases. 

Our algorithms were drawn from statistics (time series 
methods), computational finance (reinforcement learning) 
and classical machine learning (Ripper rule learning). Each 
algorithm was tailored to the problem at hand (e.g., we 
devised an appropriate reward function for reinforcement 
learning), and the algorithms were combined using a vari- 
ant of stacking to improve their predictive accuracy. 

Additional experiments on larger airfare data sets and in 
other domains (e.g., hotels, reverse auctions) are essential, 
but this initial pilot study provides the first demonstration of 
the potential of price mining algorithms to save consumers 
substantial amounts of money using data available on the 
Internet. We believe that price mining of this sort is a fertile 
area for future research. 
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